Five centuries of consanguinity, isolation, health, and conflict in Las Gobas: A Northern Medieval Iberian necropolis

Between the 8th and 11th centuries CE, the Iberian Peninsula underwent profound upheaval due to the Umayyad invasion against the Visigoths, resulting in population shifts and lasting demographic impacts. Our understanding of this period is hindered by limited written sources and few archaeogenetic studies. We analyzed 33 individuals from Las Gobas, a necropolis in northern Spain, spanning the 7th to 11th centuries. By combining archaeological and osteological data with kinship, metagenomics, and ancestry analyses, we investigate conflicts, health, and demography of these individuals. We reveal intricate family relationships and genetic continuity within a consanguineous population while also identifying several zoonoses indicative of close interactions with animals. Notably, one individual was infected with a variola virus phylogenetically clustering with the northern European variola complex between ~885 and 1000 CE. Last, we did not detect a significant increase of North African or Middle East ancestries over time since the Islamic conquest of Iberia, possibly because this community remained relatively isolated.


INTRODUCTION
During the medieval period, the Iberian Peninsula became a melting pot of ideas and lifestyles.The region witnessed the convergence of various cultures, religions, and ethnicities creating a rich and diverse social tapestry.The genetic impact of the interaction between these diverse groups is underexplored.Recent data provided a first glance of the influence of Islamic conquest on the Iberian gene pool (1).However, the timing and scale of the processes influencing the genetic makeup of the Iberian population over time and across different regions remain uncertain, with a notable dearth of archaeogenetic studies focusing on the Middle Ages in the Iberian Peninsula.
Migrations between North Africa and the Iberian Peninsula have been documented since at least the Neolithic (2)(3)(4).The Phoenician, Greek, and Carthaginian colonies, followed by the Roman Empire, played crucial roles in amplifying the genetic exchange from North Africa to the Iberian Peninsula, especially along its southern and eastern shores (1,5).The arrival of Berber populations from North Africa during the Umayyad Caliphate's conquest of the Iberian Peninsula (711 to 726 CE) exerted a pronounced genetic influence on most parts of Iberia (1,6).Between 711 and 1492 CE, diverse Islamic states characterized by distinct political and ethnic origins ruled different extensions of Iberia marked by a continual shifting of borders with the emerging Christian kingdoms in the north (7)(8)(9).Despite the geopolitical and sociocultural boundaries remaining strong between Christians and Muslims (Fig. 1) (10), the impact of the North African gene flow during this period remains traceable today (11).Bycroft et al. (11) revealed that the North African ancestry is present in all current Iberian populations, with lower levels in the Basque region and areas corresponding to the 14th century Crown of Aragon.However, this ancestry does not correlate with proximity to North Africa or extended Islamic control.The highest levels, up to 11%, are found in Galicia and Portugal (11).
Situated in northern Spain, the rural necropolis of Las Gobas (Fig. 1) spans a time transect from the 7th to the 11th centuries, encompassing individuals who lived both before and during the Islamic control of most of Iberia.This offers the opportunity to study how the dynamic regional development was mirrored in a small rural community throughout a key period in shaping the genetic structure of the Iberian Peninsula.Specifically, it allows for an indepth exploration of the genetic impact of sociocultural conflicts and population interactions within this rural medieval Christian community.Furthermore, it enables us to investigate the dissemination of pathogenic microbes during these interactions and glean insights into the potential diseases prevalent among these populations, shedding light on their way of life.Archaeogenetics, by filling critical gaps in information for this period and region, becomes a powerful tool in unraveling aspects of ancestry, health, and social interactions influenced by historical events.
Las Gobas is part of the extensive complex of medieval cave settlements in Iberia, a topic of long-standing debate concerning their temporal origins and intended purposes in European historiography.The designation "Las Gobas" refers to a cluster of 13 artificial caves meticulously carved into a rocky outcrop, spanning from the mid-6th century to the 11th century CE (12,13).This cluster includes two churches and variously sized single-room cavities.
Archaeological investigations (14) reveal that the necropolis consists of two distinct periods.Phase I (seventh-ninth centuries CE) is characterized by mixed habitation of cave architecture and free-standing structures, featuring a church (Las Gobas-6) and a cemetery containing 22 burials (Fig. 2).This phase, marked by the construction of new domestic structures outside the church, shows an active habitat with clear evidence of daily life and activities.
In phase II (10th-11th centuries CE), the settlement shed its residential character, preserving only its role as a cemetery and place of worship.A total of 19 burials have been unearthed from this latter period (Fig. 2 and table S1).Phase II begins with the abandonment of the site as a living area, as indicated by sedimentation layers and the remains of an adult cow found in a silo repurposed as a waste dump.Radiocarbon dating places this transition in the second half of the ninth century CE.The shift from a residential area to a funerary space and the eventual relocation of inhabitants to a new settlement in Laño clearly demarcate the two phases in the history of this cemetery at Las Gobas (14).Fig. 2. Kinship and burial distributions at Las Gobas.We represent here one of the possible pedigrees of the related individuals according to their autosomal, uniparental, archaeological, and anthropological data (see tables S1 to S4). the inbreeding information is listed in table S5, while the isotopic and osteological information was taken from (15,16), respectively.each bacteria species is uniquely represented by a distinct color. in addition, the variola virus is symbolized with a virus symbol. in the individual with a direct radiocarbon date, we display the 2-sigma date interval.
Earlier bioarchaeological investigations have underscored the distinctive nature of this archaeological site.Noteworthy among these are isotopic studies suggesting a shift in dietary practices over time and moderate mobility patterns (15).Particularly noteworthy is a recent work (16), which focuses on examining the skeletal remains of adult males showcasing severe cranial injuries inflicted by swords, indicative of past violent episodes.
Here, we analyze the genomes of 33 individuals buried in Las Gobas, which encompass five centuries of political, demographic, and social changes in the Iberian Peninsula.Using an archaeogenetic strategy, we delve into the genetic origin, consanguinity, and kin relations of these individuals.Furthermore, we pioneer the exploration of the infectious diseases landscape within an early Medieval Iberian community using a metagenomic approach.Last, we conducted absolute radiocarbon dating that allows us to shed light on two individuals from the first phase of the necropolis showing clear evidence of trauma generated by a violent episode.Our results are integrated in a holistic perspective with previous components such as osteological and isotopic analyses and thus combinedly will help to understand the origin and functionality of the site.

Genomic analyses
We generated whole-genome sequence data for 48 skeletal elements belonging to 37 individuals.Four individuals with a coverage <0.01× were removed, and the remaining 33 were sequenced to a coverage of 0.015× to 5.37× (median = 1, mean = 1.27) (table S2).All sequenced genomic libraries presented short fragment length, clear damage patterns, and negligible levels of contamination, indicating that the data generated here are endogenous and authentic (table S2).

Sex identification and uniparental markers
We genetically identified the biological sex of each of the 33 individuals following the method described in (17), with clear assignments obtained for all individuals as 11 females (XX) and 22 males (XY) (table S2).Of the 22 males, all but three showed the Y chromosome haplogroup R1b1a1b (R-M269) or one of its related subbranches (table S3), a genetic lineage that has been prevalent in Europe since the Bronze Age (18,19).The specific haplogroup of two of the three individuals could not be determined due to limited data, while individual 37 was found to have the haplotype L1b1, which is relatively uncommon in Europe and can be found in Middle Eastern populations today (20).On the other hand, the mitochondrial haplogroups exhibited higher diversity, representing all major European lineages including H, J, I, U, K, HV, and T (table S2).

Violence, kinship, and extended family structures
From phase I of the necropolis (direct radiocarbon dated between 610 and 907 cal CE), 21 burials have been recovered, of which 14 are males, 5 are females, and 2 are infants.We retrieved genomic data from 15 of these individuals (11 males and 4 females) and included them in the kinship analyses.Our results indicated that only six of them were not related (≥4 degree) to any other individual analyzed here.Among the related individuals, we found a first-degree relation between two males, individual 1 and individual 28, with the kinship results suggesting that they were full siblings (table S4).These potential brothers are at least third-degree related to individual 22.
Individuals 22 and 28 present several traumas caused by sword on their skulls (see fig.S1, A and B).Osteological analysis suggests that at least one of them (individual 28) died from these injuries (fig.S1B) (16) (see notes 3 and 5 in table S1).All this points to violent episodes during an early phase of occupation of Las Gobas as both individuals have the oldest direct radiocarbon dates (566 to 648 cal CE; table S1).
Individual 22 was also a third-degree relative of individual 34.Another pair of males, individuals 26 and 35, were also third-degree related to each other, probably cousins on the paternal side (Fig. 2 and table S4).All of them were associated with the oldest layer in the necropolis.Note that some burials functioned as hinges between phase I and phase II, suggesting continuity between both phases.This is well represented by the only family trio in which individuals from phase I (33 and 31) had a son (individual 40) associated with phase II.In addition, individual 23 (from phase I) is a third-degree relative of individual 41, who is also part of phase II (Fig. 2 and table S4).
We documented an increase of genetically related individuals during phase II (direct radiocarbon date from 979 to 1036 cal CE) when the Las Gobas community relocated to the valley floor as a farming community.From this phase, there were a total of 19 burials: 11 males, 3 females, and 5 infants.We retrieved genomic data from 18 of these individuals (11 males and 7 females) with at least 15 of them related to other individuals from Las Gobas.This included a three-generation family with genomic data from seven members, two of them (individual 2 and 102) are also second or third degree related to individual 7A from another family group (Fig. 2 and table S4).Two brothers having children with two sisters would be one of the possible pedigrees for this family that agrees with our uniparental and kinship results and also with the available archaeological, radiocarbon, and osteological data (Fig. 2 and table S1).We noted that due to the low coverage of individuals 2 and 102, the kinship relation between them could not be determined (Fig. 2).Although they overlap in more than 10,000 Single Nucleotide Polymorphisms (SNPs) with the other individuals, the relations involving these individuals should be taken with caution.However, they could probably be brothers as they seem equally related to the rest of the individuals (table S4).

Consanguinity
The frequency and length of runs of homozygosity (ROH) rise in correlation with the level of genealogical proximity between the two parents allowing for investigating the degree of inbreeding (21).We use the software hapROH (22), which identifies ROH in individuals with coverage ≥0.3× using reference haplotypes, to calculate and visualize the degree of consanguinity in all the Las Gobas and reference ancient Iberian individuals with >400,000 SNPs of the 1240K (23) sites covered (table S5).Several individuals presented high levels of consanguinity, especially in the first phase of Las Gobas.ROH between 4 and 8 centimorgan (cM) suggests ancestral relatedness in both phases (Fig. 3A and table S5).We also found three individuals (31, 34, and 38) from phase I and individual 41 from phase II with long ROH fragments (Fig. 3, A and B).The sum of ROH fragments > 20 cM suggest that the parents of these individuals were closely related (Fig. 3B).Our results indicate that within the reference ancient Iberian individuals (table S5), levels of inbreeding (sum of ROH fragments > 20 cM) were also present in one Iberian Islamic individual (I7457) (table S5).

Metagenomic analyses
In total, we identify six pathogens in the population of Las Gobas (table S6).Four pathogenic species were detected based on aMeta's authentication scores (24): Erysipelothrix rhusiopathiae, Leptospira interrogans, Streptococcus pneumoniae, and Yersinia enterocolitica.Since viruses might be harder to detect due to low coverage, and because database configuration can affect the number of taxReads assigned to the species level, we investigated the k-mers/reads ratio of pathogenic organisms detected with KrakenUniq (25) to assess their presence (see Materials and Methods).We further identified two additional pathogens namely Borrelia recurrentis (ratio = 16.6) and variola virus (ratio = 7.9) (table S6) under the aMeta's default thresholds (200 taxReads and 1000 k-mers).Because of database configuration in National Center for Biotechnology Information (NCBI), B. recurrentis only exhibited 2 reads at the species level (taxReads), but actually 359 when encompassing the lower taxonomic nodes and 5958 k-mers.The variola virus was just under the thresholds with 195 taxReads, 202 reads, and 1597 k-mers.Given its historical interest, individual 43 infected with variola virus was subjected to another round of shotgun sequencing.
To identify potential host-associated microbes and pathogens mentioned above using aMeta, we used an elimination process.Authentication scores to evaluate true presence and ancient status were generated by the aMeta pipeline (24) based on nine metrics (edit distance, edit distance of ancient reads, deamination on both read ends, mean read length, postmortem deamination score, average nucleotide identity, amount of reads, and evenness of coverage, the latter adding two points).Through this method, we detected 56 microbial organisms with an authentication score of ≥8 (see Materials and Methods and figs.S2 and S3).However, 41 were filtered out (fig.S3) as they could not be disentangled from environmental contamination.Among the remaining 15 (fig.S2), 12 were associated with the human oral microbiome (26-28) (among which one is a potentially asymptomatic pathogen), and the remaining 3 were pathogenic.
S. pneumoniae, the potentially asymptomatic pathogen, was authenticated in individuals 44, 46, and 47 (figs.S4 to S6) but could not be validated in individual 43 based on authentication plots.This bacterium, present in the oral microbiome of about 10% of adults and more children, can cause respiratory infections, pneumonia, and bacteremia (29).
The three remaining identified pathogenic species by aMeta are E. rhusiopathiae, confirmed in individuals 28, 32, 34, and 37 from phase I and in individual 7A from phase II (range of breadth of coverage 2.26 to 92.44%), L. interrogans in individual 43, and Y. enterocolitica in individual 32 (tables S6 to S8 and figs.S2 and S7 to S14).These pathogens are responsible for distinct diseases: erisypeloid, a skin infection transmitted through wound contact with infected animal material; leptospirosis, a potentially fatal disease leading to organ failure, typically contracted from contact with contaminated water; and yersiniosis, a severe form of gastroenteritis, respectively.The consistent coverage of genes associated with virulence, such as inv, yst, and myf, supports the pathogenic nature of the identified Y. enterocolitica (figs.S15 to S17).Competitive mapping further validates our results for these three pathogens (table S7).
In the case of E. rhusiopathiae, two of the five candidates had confident high coverage (47.7 and 92.4%), while the remaining three had low coverage (less than 10%).The latter three samples were included in the phylogenetic analysis to test their phylogenetic signal leading to a clear E. rhusiopathiae origin (see Materials and Methods).Overall, all samples formed a well-supported monophyletic clade [ultrafast bootstrap support (100%)/approximate Likelihood-Ratio Test (aLRT) (87%)] clustering with an E. rhusiopathiae pig sample from Russia (GCF_003725505.1)(fig.S18).However, to infer a common origin for the five samples, we independently tested the positioning of each of the five branches (table S8).In three of five cases, the Las Gobas sequence was positioned next to the same pig sequence (GCF_003725505.1),indicating a single origin for at least these three Las Gobas samples (37, 34, and 32).For the other two (7A and 28), further capture sequencing would be required to confirm the E. rhusiopathiae specific subbranch origin with confidence.
We note that, although E. rhusiopathiae samples exhibited a declining edit distance (figs.S7 to S11), numerous reads had a high number of mismatches (4 to 10) for several individuals.This could be

of 15
due to ancient damage and/or false comapping of an additional strain.The latter can be attributed to the fact that the closest strain from the Las Gobas E. rhusiopathiae, present in the aMeta database, was quite distant in the phylogenetic analysis, comprising more E. rhusiopathiae samples.While the MEGAN Alignment Tool (MALT) (30) analyses classified the E. rhusiopathiae GCF_900637845.1 as the closest strain, the genome GCF_003725505.1, not present in the aMeta database, was the closest in the phylogeny (fig.S18).
Investigating under the aMeta threshold, we identified two additional pathogens, B. recurrentis, responsible for the louse-borne relapsing fever in individual 12, and an ancient variola virus (aVARV), the causative agent of smallpox in individual 43 (figs.S19 to S21).Both diseases have a high mortality rate, and B. recurrentis is still present in different parts of the world today, like the Horn of Africa, while smallpox has been eradicated (31).Competitive mapping supported the assignment of the reads to B. recurrentis in comparison to other pathogens of the Borrelia genus (table S7).A comparative edit distance between several reference genomes showed that the aVARV detected here was closest to the only available (almost) full consensus sequence for aVARV (VK382) (fig.S22) (32).After the second round of sequencing, we recovered 4228 reads for aVARV, summing up to a 53.4% breadth of coverage and 0.94× coverage when mapping to VK382.
To determine the phylogenetic position of aVARV, we constructed a tree incorporating the conserved regions of 90 poxviruses (see Materials and Methods) (Fig. 4 and fig.S23).The final alignment consisted of 103,504 sites and 8117 parsimony-informative sites.A total of 56,352 sites were covered by the aVARV from individual 43.While the results warrant cautious interpretation due to the low coverage of our sample, it shows that it clusters robustly with other ancient VARV strains identified in individuals buried in northern Europe between ~885 and 1000 CE (Fig. 4), regardless of the reference genome we align it to, be it VK382 (fig.S23) or the closest related virus to variola, the taterapox virus (fig.S24).

Genetic structure
We projected individuals from Las Gobas together with previously published ancient Iberian individuals (table S9) onto the first two principal components of modern West Eurasians and North African populations using the dataset 1240K+HO v.54.1.p1(Fig. 5) (23).The principal components analysis (PCA) indicated genetic similarity among the two different phases of the necropolis, with most of the individuals placed within the range of the genetic variation observed in modern Spanish populations (Fig. 5).Except for two individuals displaying negative PC1 values (in the direction modern north Africans) and clustering near Islamic Iberian, south Iberian Visigoths, and Iberian Romans, the remaining individuals cluster close to Iron Age and early Medieval north Iberians (Fig. 5).
The fact that the individuals classified as south Iberian Visigoths from the necropolis el Castillo de Montefrío (Granada) already presents high levels of North African ancestry (29%) (Fig. 6 and table S10) while being dated between 400 and 600 CE (several generations before the Umayyad conquest of Iberia) requires further explanation (1,34).Gene flow between Visigoths in southern Iberia and populations with high levels of North African ancestry (e.g., Roman elites), combined with the acculturation of the latter group, may explain the high degree of North African ancestry in early Visigoths of southern Iberia.Another explanation for the notable North African ancestry observed in some south and south-east Iberian individuals could be the earlier Punic expansion into Iberia.In addition, the long history of interactions between North African Berber communities and southern Iberia, including gradual settlement and mercantile activities, likely played an important role in shaping of the genetic landscape.This perspective is supported by extensive scholarship examining the complex processes of Islamization and settlement, suggesting gradual contacts before the Umayyad conquest of Iberia (35)(36)(37)(38).Further archaeological and genetic research is needed to clarify these points and to explore other possible scenarios.
The two previously mentioned outliers in the PCA (individual 37, a male with the L1b1 Y chromosome haplotype from phase I dated to 773 to 892 CE, and individual 7B, a female from phase II older than 993 to 1050 CE; table S2) present some of the highest levels of North African component (23 and 13%, respectively) in the admixture analyses at K = 5 (table S10).The Umayyad conquest of southern and central parts of Iberia (711 CE) predate both individuals.
We conduct qpAdm analyses (19,39) to formally assess the extent of North African ancestry in the Las Gobas individuals.In Fig. 6B, we present the results of a two-source model, using Visigoths or Iron Age individuals from northern Iberia (1) and Canary Islands aborigines populations (40,41) as proxies for early Medieval northern Iberians and North Africans, respectively.This analytical approach enables us to incorporate related individuals, considering the independence of each ancient individual in the analysis.
In general, the qpAdm results align with the ADMIXTURE results, revealing relatively low levels of North African (Canary Islands aborigines) ancestry, with some individuals showing none at all (table S11).However, it is crucial to interpret these findings cautiously, as Canary Islands aborigine populations may not accurately represent the Medieval North African populations that conquered Iberia due to continuous gene flow from diverse populations in North Africa since the Canary islands were populated (40)(41)(42).Note that our qpAdm analyses exclude a population representing the Middle East/Caucasus, in contrast to the ADMIXTURE analysis where such populations were included.The exclusion is based on the unviability of models incorporating Middle East/Caucasus for most individuals.This divergence in model inclusion may contribute to potential inflation of Guanche percentages compared to ADMIXTURE results for specific individuals (see Fig. 6B).To further investigate the influence of the Islamic expansion on individuals from Las Gobas, we examined the correlation between individuals' radiocarbon ages and ancestries calculated with both methods ADMIXTURE (Fig. 7A) and qpAdm (Fig. 7B).Our findings indicate that there is no significant correlation between the age of individuals and the varying levels of these ancestries (Fig. 7).
Genomic analyses of individuals from Las Gobas, dating back to periods both before and during the Umayyad conquest, indicate that this historical event did not result in a significant increase in nonlocal genetic ancestry within this rural community over time (Fig. 7).However, we note that the six individuals exhibiting over 10% North African-like ancestry (table S10) postdate the period following the Islamic expansion in Iberia.These findings suggest that even within this North Iberian isolated rural population, there were periods of sporadic genetic exchange with populations carrying North African ancestry, with a substantial portion likely occurring during the Umayyad Caliphate.However, the detection of minor levels (<10%) of North African ancestry in certain individuals from Las Gobas predating the Islamic conquest of Iberia, coupled with the observation that Roman Iberians, and although in a  S11).european and north African components are distinguished by color (purple and orange, respectively).For individuals 35 and 47, iron Age populations from north iberia were used instead of northern iberian Visigoths as the latter did not provide a feasible model.error bars indicate the upper and lower Ses. the P value for each analysis is displayed on the plot.limited proportion (~4%) even Iron Age Iberians, also exhibit North African ancestry, prevents us from conclusively determining the timing and source of the North African genetic influence on individuals from Las Gobas.
To test population affinities between the individuals from Las Gobas and other groups, we divided the ancient individuals in the two different phases, removed one individual from each pair of related individuals, and performed f-statistics analysis.The multidimensional scaling (MDS) of pairwise f 3 results indicated that individuals from the two phases shared most genetic affinities (Fig. 8 and table S12), suggesting that Las Gobas forms a homogeneous population, likely as a result of population continuity across the centuries with relatively low levels of nonlocal gene flow.Dimension 1 of the f 3 MDS plot is probably mainly driven by an increase in North African and Middle East ancestries from left to right.

DISCUSSION
The borders of the different Islamic and Christian governments fluctuated drastically during the almost eight centuries that parts of Iberia were under Islamic rule (7).Over extended periods, the frontier with the Islamic-ruled territories remained proximate to Las Gobas (Fig. 1B), with well-documented Islamic influence extending as far as Pamplona, located 100 km to the northeast (43,44).However, the individuals buried at Las Gobas present levels of North African ancestry comparable to pre-Roman Iron Age Iberians, north Iberian Visigoths, and Carolingians (groups that did not have apparent interactions with North African populations).Notably, these levels are significantly lower than those observed in individuals from Roman sites, south Iberian Visigoths, and Islamic Iberians.This observation implies that the Roman influence before and the Islamic influence during the five centuries covered by the Las Gobas record did not have a substantial  impact on the genetic composition of this small Christian community.These findings agree with the historical sources and modern genetic data (45), which suggest that the northern regions of the Iberian Peninsula, particularly the Basque region, experienced comparatively lower genetic influence from North African populations during the Middle Ages and the Roman era.However, it is important to note that Roman Iberians used here as comparative data refer to individuals found in Roman sites within Iberia (1) and probably represent the diversity of the vast Roman Empire populations rather than local Iberians during the Roman Age (table S9).
The presence of consanguinity and violence in the early stages of the necropolis, together with the relatively small variations observed in the Y chromosome, leads to speculation about the possibility that the site was populated in the seventh century CE by a small patrilocal endogamous group that could be, for example, members of an elite with military experience.Conversely, changes in the second phase, where we have identified more kinship connections and an absence of evident sword injuries, suggest that the site has probably changed its character to that of a rural farming necropolis.In addition, there is a slight decrease in the number of individuals with high levels of inbreeding in this later phase of the Necropolis (Fig. 3 and table S5), which may reflect changes in social organization.However, we have documented genetic continuity between the two phases, and our results suggest the practice of endogamy and close interactions with animals over the five centuries under study.The geographic location of Las Gobas, surrounded by mountains and relatively distant from urban centers, likely contributes to the community's isolation.The observed high levels of consanguinity among some individuals from Las Gobas indicate cultural practices of endogamy, which could also limit the gene flow from neighboring communities.This local isolation may, over time, also result in reduced gene flow from Arabs and North Africans on a broader scale.
The metagenomic results suggest close interactions with animals, as the majority of infectious diseases detected within the Las Gobas community were zoonoses.The most common was E. rhusiopathiae, a skin infection that usually arises when handling meat while having open cuts (46,47).Two hypotheses can explain its presence in Las Gobas human remains: (i) It originated from the necrobiome; however, studies suggest limited soil persistence (48,49); (ii) it was a zoonotic transmission from infected animals, which is a more likely scenario.However, while the closest taxa from the five samples was a porcine strain from Russia, the origin of the host remains unclear as no strict host specificity has been described for this species (41).
The discovery of Y. enterocolitica, a food-borne pathogen, agent of yersiniosis and often associated with undercooked pork consumption, indicates the presence of this pathogen in at least one individual consistent with previous findings in ancient human remains (50,51).Other zoonotic pathogens detected in Las Gobas were B. recurrentis, transmitted by the human body louse, and leptospirosis, caused by L. interrogans.Leptospirosis is generally acquired through contact with water contaminated by the urine of infected animals.
In addition to exploring the genetic composition and health conditions of the Las Gobas community, our study confirms the pan-European distribution of smallpox in the Middle Ages (32) and provides valuable insights into the intriguing historical puzzle of how it entered Europe.Smallpox, with its devastating impact, poses a complex challenge in tracing its origins and spread throughout the continent.Given the infectivity and long incubation time of the disease, it is likely that smaller-scale events might have brought it to different parts of Europe at different times.However, if the spread of smallpox intensified with increased human mobility, then specific scenarios can be explored in the light of recent archaeogenetic findings.For example, the presence of aVARV in individuals buried in northern Europe during the seventh century (32,52), before the Viking Age and predating the Umayyad invasion of 710 CE, suggests that the virus was already present in Europe before the Islamic conquest of Iberia.Previous hypotheses that linked the presence of the virus in Europe with the migrations from the east around the fifth and sixth centuries better fit the archaeogenetic data (32,53).The phylogenetic clustering of the aVARV found in the individual 43 from Las Gobas with those discovered in northern European individuals, dating back to between 885 and 1000 CE, suggests that the virus may have reached the Las Gobas individual via Europe rather than through Islamic routes, which was one of the dominant ideas for how smallpox entered Iberia (54).Whether the Umayyad invasion was an additional factor in the spread of smallpox will eventually be clarified with early medieval VARV genomes from individuals buried in North Africa and the Middle East.
The individual infected with the aVARV virus was also afflicted by leptospirosis.Coinfections involving two different pathogens, like the one observed in individual 32 with both E. rhusiopathiae and Y. enterocolitica, were likely as common in the past as they are today (55).
By integrating genetic, archaeological, and historical data, we unveil the presence of an endogamous and consanguineous community in Northern Iberia.Despite centuries of turbulent regional history, this community experienced limited external gene flow.Our findings indicate that this community remained relatively isolated for at least five centuries.Last, the metagenomic analyses of the Las Gobas individuals integrated with the rest of the information gathered in this study complements the understanding of the health condition of this rural community.By focusing on site-specific pathogens, we provide a detailed view of the disease landscape at a specific time and place while also highlighting previously unreported ancient pathogens, such as E. rhusiopathiae and L. interrogans.To corroborate certain assertions made in this study and pinpoint the source and timing of North African ancestry in Iberian Peninsula populations across various epochs, additional ancient genomes from local Iberians during the Roman period, and particularly North African genomes spanning the Roman era to the 11th centuries CE, will be essential.
Last, while we can identify the detected pathogens and confirm their ancient status, their low coverage sometimes limits our capacity for further phylogenetic and gene evolution analyses.In addition, in L. interrogans, the observed nonperfectly decreasing edit distance pattern may reflect evolutionary differences between the modern reference genome and the ancient strains and/or the lack of appropriate reference genomes in the database.We note that despite the various statistics and quality controls shown in the study supporting the presence of B. recurrentis in individual 12 and L. interrogans in individual 43, the low coverage prevents us from confirming their presence with absolute confidence.

DNA extraction
The human remains were sampled in the aDNA facilities at the Archaeological Research Laboratory, Stockholm University (Sweden).
All samples were decontaminated before analysis with ultraviolet irradiation (6 J/cm 2 at 254 nm).After removing the surface, bone was drilled to powder, and the root tip of the teeth was cut with a multitool drill (Dremel) to get approximately 80 to 150 mg of bone powder/root tip.
The root tip samples were placed in an Eppendorf tube in 1 ml of predigestion buffer [0.45 M EDTA (pH 8.0)] at 37°C in a hybridization oven with rotation.After 30 min, the supernatant was removed to reduce the exogenous DNA contamination.Following this predigestion step, 1 ml of extraction buffer [0.45 M EDTA (pH 8.0) and proteinase K (0.25 mg/ml)] was added to all the samples, and they were incubated at 37°C in the hybridization oven with rotation for 1 to 4 days until all powder/root tip was dissolved.The extraction was conducted following (56), with 1 ml of digested extract being combined with 13 ml of binding buffer containing 5 M guanidine hydrochloride, 40% (v/v) isopropanol, 0.05% Tween 20, and 90 mM sodium acetate (pH 5.2).Fifty-milliliter silica columns (Roche, High Pure Viral Nucleic Acid Large Volume Kit) were used for purification, and the DNA was eluted in 45 μl of Elution Buffer (EB; Qiagen).Blank controls were used along all extraction steps.

Library preparation and sequencing
Twenty microliters of extract was used to prepare blunt-end ligation DNA libraries coupled with P5 and P7 adapters and double indexes as described in (57).Blank controls were used during every step of library preparation and amplification.The optimal number of polymerase chain reaction (PCR) cycles for library amplification was determined with quantitative PCR.The amplification reactions had a total volume of 50 μl, with 5 μl of DNA library, and the following in final concentrations: 1× AmpliTaq Gold Buffer, 2.5 mM MgCl 2 , 25 μM of each deoxynucleotide triphosphate, 2.5 U of AmpliTaq Gold (Thermo Fisher Scientific, Waltham, MA), and 200 nM of each of the index primers (Meyer and Kircher, 2010).PCR was done with the following conditions: an activation step at 94°C for 10 min followed by 8 to 20 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 45 s and a final elongation step of 72°C for 10 min.Four amplification reactions with the same indexing primer were made for each library to increase complexity.Last, the amplified libraries were pooled and purified with AMPure XP beads (Agencourt; Beckman Coulter, Brea, CA), and the fragment size and concentration were checked using BioAnalyzer with the High Sensitivity Kit (Agilent Technologies, Cary, NC).

Processing and alignment of sequencing reads
Purified libraries were pooled in equimolar concentration and sequenced on an Illumina HiSeq X10 and/or on NovaSeq 6000 at the SciLifeLab National Genomics Infrastructure in Stockholm.Sequencing reads were demultiplexed according to the index of each sample sequence.Cutadapt v. 2.3 was used for trimming adapters and FLASH v. 1.2.11 for merging of fastq reads.The reads were mapped against human reference genome build 37 (hs37d5) using the Burrows-Wheeler Algorithm, as implemented by Burrows-Wheeler Alignment (BWA) v. 0.7.17, (58) with the following parameters aln (-l 16500 -n 0.01 -o 2).The fastq files from different sequencing runs of the same library were merged using Samtools v. 1.17.merge option (59).Next, a slightly modified version of FilterUniqueSAMCons.py (60) was used to condense the reads with identical start and end position into a consensus read.Last, reads shorter than 35 bp and reads with less than 90% identity with the reference were filtered out using percidentity_threshold.py (61).Because of the low coverage of the ancient samples, we built pseudo-haploid genomes by randomly choosing one read with a minimum mapping quality of 30 and a base quality of 30.

Data validation
All libraries yielded short read lengths, and patterns of cytosine deamination were estimated using PMDtools (table S1) (62).We have applied two methods based on the mitochondrial DNA (mtDNA) to estimate contamination in our samples (63,64).In addition, we used the "Contamination" program in ANGSD (65) v.0.911 to estimate X chromosome contamination in males, as described in (66).Contamination estimates are shown in table S2.

Metagenomic analysis
Metagenomic analysis of the DNA libraries was performed using aMeta v.1.0.0 (24) which yielded authentication scores for each detected microbial organism and authentication plots.We then retained microbes with an authentication score of ≥8 as a proxy for true presence and ancient status.We assigned the status of environmental contaminants to microbes from various isolation sources, including soil, storage, and lab contamination referenced in the BacDive database (67) or presence in more than 1% of soil samples (excluding farm and unknown types) from the Microbe Atlas Project Database (68).Several bacteria were further excluded on the basis of the literature (69,70) or due to insufficient information.We further classified the remaining microbes as pathogens or as part of the human oral microbiome, the latter based on the expanded Human Oral Microbiome Database and the literature (26)(27)(28).To evaluate Y. enterocolitica's pathogenicity, we aligned it to its nearest reference genome per KrakenUniq (25) (type O:5, strain YE53/03, reference: NZ_HF571988.1)and visualized the coverage of genes of virulence inv, yst, and myf using Integrative Genomics Viewer (IGV; figs.S15 to S17) (71).
In addition, we assessed the presence of pathogens below aMeta default thresholds by calculating k-mers to read ratios, using this as a proxy for evenness of coverage and filtering for known pathogenic species.After alignment with bwa aln v.0.7.17 and removal of reads shorter than 30 bp and duplicates, we generated coverage, edit distance, and damage pattern plots using aDNA-BAMPlotter (72) and MapDamage v.2.2.2 (73) for the pathogen hits showing the best ratios, namely, B. recurrentis and variola virus.
Last, we confirmed bacterial pathogens identification through competitive mapping (table S7).For the species detected with aMeta, we extracted species-specific reads with MaltExtract v.1.7 with the options --destackingOff and --downSampOff to retain all reads (74).We then aligned each pathogen hit with Bowtie 2 v.2.5.2 (75) against databases containing the first representative assembly from NCBI for each species of their respective genus.
To generate a phylogenetic tree for Erysipelothrix species, we first extracted all reads from seven human samples that were classified as E. rhusiopathiae with MaltExtract.Next, we downloaded all available E. rhusiopathiae (n = 46), Erysipelothrix inopinata (n = 1), and Erysipelothrix tonsillarum (n = 1) assemblies (not including the ones removed by NCBI) (download: 19 March 2024) and aligned the modern genomes together using Sibeliaz v1.2.5 (85) (-k15).The extracted reads were mapped to their closest reference genome (based on the result of the competitive mapping), namely, E. rhusiopathiae (GCA_900637845) for individuals 7A, 32, 28, 34, and 37; E. inopinata (GCF_014396165) for individual 43; and E. tonsillarum (GCF_000373785.1)for individual 14.Mapped reads were filtered using the same parameters as for aVARV.The final alignment consisted of 2,415,066 sites and 41,700 parsimonyinformative sites.The number of sites covered by the seven ancient Erysipelothrix candidate species ranged from 8505 (individual 14) to 1,357,672 (individual 37) sites.Last we generated a maximum likelihood tree using IQ-Tree v2.2.2.6 (83), with 1000 UFboot and SH-aLRT bootstraps (options -bb 1000 -alrt 1000) (83).Values of UFboot ≥ 95% and SH-aLRT ≥ 80% were considered as reliable as recommended by IQ-Tree authors.The implemented program ModelFinder (84) identified the best substitution model according to Bayesian Information Criterion (BIC) as TVM + F + I + R4 (option -m MFP).To rule out an attraction effect between the five Las Gobas E. rhusiopathiae hits, a phylogeny of each of them was inferred.This was performed by building five independent phylogenies with the same parameters as the general phylogeny but including only one Las Gobas E. rhusiopathiae sequence per phylogeny (see all newick files in table S8).

Population genomics Datasets
We used TrimBam of BamUtils (82) to trim 10 bp at the end of each read in our ancient individuals to remove the damage patterns characteristic of ancient and degraded DNA.Next, we merged the ancient samples with two different datasets from (23) version 54.1.p1:(i) "1240K+HO" and (ii) "1240K."In the ADMIXTURE and PCA analyses, we use a pseudohaploidized version of the dataset to mitigate bias when analyzing both haploid (ancient) and diploid (modern) individuals.
We applied additional filtering for the PCA analyses, excluding five individuals with genome coverage < 0.17 (table S1), resulting in 28 remaining individuals.In the ADMIXTURE and f-statistic analyses, we further exclude nine individuals; one with the lowest coverage from each pair of genetically related individuals with a k 0 value < 0.8 and pi_HAT > 0.06 (~third-degree kinship) (table S2).Therefore, 19 unrelated individuals were used in the ADMIXTURE and f-statistic analyses.

Mitochondrial haplogroups
We used Samtools v. 1.17 (59) to filter out mtDNA reads with mapping and base quality of minimum 30.Next, we used bcftools to call the consensus and generated fasta files using consfastQ2fasta.py.Last, we use HaploGrep 2.1.16(86,87) to assign mtDNA haplogroups based on PhyloTree Build 17 phylogeny (table S2) (88).

Y chromosome haplogroups
We determined Y chromosome haplogroups in the male individuals using pathPhynder v.1a (89).The "BigTree" Y chromosome dataset (included with pathPhynder) was used as the reference phylogeny, minimum base quality was set to 30, and maximum tolerance was set to 100.Both "default" and "transversions" filtering modes were used, which generated comparable results (table S3).

Regions of homozygosity
We used the hapROH software (22) to compute and plot the ROH, adhering to the guidelines provided by the authors (https://pypi.org/project/hapROH/).

Kinship analysis
We inferred genetic kinship for all pairs of the Las Gobas individuals using genomic data in combination with additional information (e.g., uniparental markers) and using four different methods: lcMLkin (90), KIN (91), NgsRelate (92), and LocalNgsRelate (93) as described below.We jointly analyzed the results of the four software to assess consistency among the outcomes, using this as a confidence measure for kinship estimation.We used a SNP panel of 1,681,497 transversion SNPs from the Estonian Genome Diversity Project (EGDP) (94) for the autosomal kinship analysis.For X chromosome-based kinship estimation, we again included only transversion SNPs from the EGDP dataset, with at least 5% minor allele frequency (MAF) and filtered the pseudo-autosomal regions from the X chromosomes as described in the human reference genome (hs37d5).After this filtering, 74,045 SNPs remained.lcMLkin We estimated kinship relations between the individuals analyzed in this study using the software lcMLkin (90) and our BAM files.We selected the SNPs from the EGDP (94) with a MAF greater than or equal to 0.15 and using only the transversions to avoid postmortem damage bias in our ancient samples.The genotype likelihoods of the selected SNP positions were called with "SNPbam2vcf.py"(90) using the population allele frequencies from the Las Gobas individual plus 104 present-day Spanish individuals (IBS) from the 1K genomes phase 3 (95).Following the developer recommendations, we first run lcMLkin assuming all individuals are unrelated to detect pairs of related individuals.Then, we removed all related and ancient individuals and run it again to obtain more reliable estimates of kinship coefficients using the founder flag -u with only the modern Spanish individuals.

Fig. 1 .
Fig. 1.Map of the Iberian Peninsula.in red, the location of the necropolis of las Gobas.city names are shown in italics.(A) the iberian Peninsula around the year 569 ce.(B) the different kingdoms and border with the caliphate of córdoba in the 10th century [maps based on (103)].

Fig. 3 .
Fig. 3. Runs of homozygosity (ROH).(A)ROh analysis for individuals from las Gobas, exhibiting cumulative ROh lengths exceeding 4 cM and coverage of more than 400,000 SnPs from the 1240K dataset(23).the numbers on the x axis represent individual codes.(B) ROh lengths and expected densities of ROh for certain degrees of parental relationships(22) for the las Gobas individual with the highest sum of inferred ROh.

Fig. 4 .
Fig. 4. Reduced maximum likelihood tree of variola virus.All samples were mapped to the consensus sequence of aVARV from a Viking from Öland (VK382).taxon name fields indicate site/clade, three-letter country code, and date/time.historical variola virus is shown in purple, ancient VARV in blue, and the ancient VARV from las Gobas in red.We collapsed the modern clades and kept the closest outgroups: the taterapox virus (tAtV) and the camelpox virus (cMlV) (for full tree, see fig.S22).node confidence values (bootstrap support) are shown at each node.the scale bar shows the average number of nucleotide substitutions per site.

Fig. 5 .
Fig. 5. PCA of Las Gobas individuals.Genetic data from las Gobas (n = 28) and other ancient individuals projected onto the first two principal components of modern north Africans and West eurasians.

Fig. 6 .
Fig. 6.Ancestry components.(A) AdMiXtURe analysis at K = 5 of 19 unrelated individuals and a selection of modern populations from the 1240K+hO dataset together with published ancient genomes (table S10).(B) A bar plot illustrating qpAdm results depicting the composition of european ancestry, represented by northern iberian Visigoths or iron Age individuals from north iberia, and north African ancestry, represented by canary islands aborigines, across 33 individuals from las Gobas.the individuals are ordered by id and radiocarbon age on the x axis.the y axis represents qpAdm values (tableS11).european and north African components are distinguished by color (purple and orange, respectively).For individuals 35 and 47, iron Age populations from north iberia were used instead of northern iberian Visigoths as the latter did not provide a feasible model.error bars indicate the upper and lower Ses. the P value for each analysis is displayed on the plot.

Fig. 8 .
Fig. 8. f3-statistic MDS plot.Pairwise genetic distance matrix from inverted f3 values (1-f3) of the form f3 (ancient group A and ancient group B; Mbuti) between ancient iberian groups from different periods and cultures (goodness of fit r 2 = 0.999).
Present-day p po op pu ul la at ti io on ns s Middle East & Caucasus North Africa North Africa Jew Europe Spanish A An nc ci ie en nt t p po op pu ul la at ti io on ns s Pre−European Canary Islands